Dataset info
| Number of variables | 6 |
|---|---|
| Number of observations | 2935849 |
| Missing cells | 0 (0.0%) |
| Duplicate rows | 6 (< 0.1%) |
| Total size in memory | 134.4 MiB |
| Average record size in memory | 48.0 B |
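The memory figures above are consistent with six 8-byte values per row. The following is a quick arithmetic check, assuming the report uses pandas' shallow memory accounting (each numeric value, and each pointer held by the object-typed `date` column, counted as 8 bytes) and ignoring the small index overhead:

```python
# Figures taken from the dataset info table.
n_rows = 2_935_849
n_cols = 6
bytes_per_value = 8  # assumption: int64/float64 values and 8-byte object pointers

record_bytes = n_cols * bytes_per_value
total_mib = n_rows * record_bytes / 2**20
column_mib = n_rows * bytes_per_value / 2**20

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under this assumption the strings in `date` are not counted at their full size, so the true footprint of that column is likely larger than the reported 22.4 MiB.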
Variable types
| Numeric | 5 |
|---|---|
| Categorical | 1 |
| Boolean | 0 |
| Date | 0 |
| URL | 0 |
| Text (Unique) | 0 |
| Rejected | 0 |
| Unsupported | 0 |
Warnings
| Dataset has 6 (< 0.1%) duplicate rows | Warning |
|---|---|
| date only contains datetime values, but is categorical. Consider applying pd.to_datetime() | Type |
| date has a high cardinality: 1034 distinct values | Warning |
| date_block_num has 115690 (3.9%) zeros | Zeros |
| item_cnt_day is highly skewed (γ1 = 272.8331617) | Skewed |
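The duplicate-row and datetime warnings can each be handled in a line or two. The sketch below uses a tiny hand-made frame in place of the real data (the variable name `df` and the sample rows are assumptions) and assumes every date follows the day.month.year pattern seen in the report, such as `28.12.2013`. Whether the duplicate rows are errors or genuine repeat sales is not something the report can tell, so dropping them is a judgment call.

```python
import pandas as pd

# Tiny stand-in for the real data; values mimic rows shown in the report.
df = pd.DataFrame(
    {
        "date": ["02.01.2013", "03.01.2013", "03.01.2013", "31.10.2015"],
        "date_block_num": [0, 0, 0, 33],
        "shop_id": [59, 25, 25, 25],
        "item_id": [22154, 2552, 2552, 7409],
        "item_price": [999.0, 899.0, 899.0, 299.0],
        "item_cnt_day": [1.0, 1.0, 1.0, 1.0],
    }
)

# Parse day.month.year strings; an explicit format avoids
# day/month ambiguity for values such as 02.01.2013.
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")

# Count fully duplicated rows, then drop them (the first occurrence is kept).
n_duplicates = int(df.duplicated().sum())
deduplicated = df.drop_duplicates().reset_index(drop=True)

print(df["date"].dtype)
print(n_duplicates, len(deduplicated))
```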
date
Categorical
| Distinct count | 1034 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 28.12.2013 | 9434 | 0.3% | |
| 29.12.2013 | 9335 | 0.3% | |
| 30.12.2014 | 9324 | 0.3% | |
| 30.12.2013 | 9138 | 0.3% | |
| 31.12.2014 | 8347 | 0.3% | |
| 27.12.2014 | 8041 | 0.3% | |
| 31.12.2013 | 7765 | 0.3% | |
| 23.02.2013 | 7577 | 0.3% | |
| 28.12.2014 | 7370 | 0.3% | |
| 21.12.2013 | 6773 | 0.2% | |
| Other values (1024) | 2852745 | 97.2% |
| Max length | 10 |
|---|---|
| Mean length | 10 |
| Min length | 10 |
| Contains chars | False |
| Contains digits | True |
| Contains spaces | False |
| Contains non-words | True |
date_block_num
Numeric
| Distinct count | 34 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 14.56991146 |
|---|---|
| Minimum | 0 |
| Maximum | 33 |
| Zeros (%) | 3.9% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| Median | 14 |
| Q3 | 23 |
| 95-th percentile | 31 |
| Maximum | 33 |
| Range | 33 |
| Interquartile range | 16 |
Descriptive statistics
| Standard deviation | 9.422987709 |
|---|---|
| Coef of variation | 0.6467429629 |
| Kurtosis | -1.082868996 |
| Mean | 14.56991146 |
| MAD | 8.119015654 |
| Skewness | 0.2038579466 |
| Sum | 42775060 |
| Variance | 88.79269736 |
| Memory size | 22.4 MiB |
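Several of the descriptive statistics for `date_block_num` follow from one another, which gives a simple consistency check on the report. A minimal sketch, using only the figures printed in the tables and the standard definitions (mean = sum / n, variance = standard deviation squared, coefficient of variation = standard deviation / mean):

```python
import math

# Figures copied from the report.
n = 2_935_849          # number of observations
total = 42_775_060     # Sum
std = 9.422987709      # Standard deviation

mean = total / n
variance = std ** 2
coef_of_variation = std / mean

print(f"mean:     {mean:.4f}")
print(f"variance: {variance:.4f}")
print(f"CV:       {coef_of_variation:.4f}")

# Compare against the values the report prints.
print(math.isclose(mean, 14.56991146, rel_tol=1e-6))
print(math.isclose(variance, 88.79269736, rel_tol=1e-6))
print(math.isclose(coef_of_variation, 0.6467429629, rel_tol=1e-6))
```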
Histogram with fixed size bins (bins=34)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 29.5 30.5 31.5 32.5 33. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 11 | 143246 | 4.9% | |
| 23 | 130786 | 4.5% | |
| 2 | 121347 | 4.1% | |
| 0 | 115690 | 3.9% | |
| 1 | 108613 | 3.7% | |
| 7 | 104772 | 3.6% | |
| 6 | 100548 | 3.4% | |
| 5 | 100403 | 3.4% | |
| 12 | 99349 | 3.4% | |
| 10 | 96736 | 3.3% | |
| Other values (24) | 1814359 | 61.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 115690 | 3.9% | |
| 1 | 108613 | 3.7% | |
| 2 | 121347 | 4.1% | |
| 3 | 94109 | 3.2% | |
| 4 | 91759 | 3.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 33 | 53514 | 1.8% | |
| 32 | 50588 | 1.7% | |
| 31 | 57029 | 1.9% | |
| 30 | 55549 | 1.9% | |
| 29 | 54617 | 1.9% |
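Judging by the first and last rows of the data, `date_block_num` appears to be a consecutive month index: rows dated January 2013 carry 0 and rows dated October 2015 carry 33. A minimal sketch of that mapping, assuming the index starts at January 2013 and grows by one per calendar month (the helper name `month_block` is hypothetical):

```python
from datetime import datetime


def month_block(date_str: str) -> int:
    """Map a day.month.year string to a month index, with January 2013 as 0."""
    d = datetime.strptime(date_str, "%d.%m.%Y")
    return (d.year - 2013) * 12 + (d.month - 1)


print(month_block("02.01.2013"))  # 0
print(month_block("31.10.2015"))  # 33
print(month_block("28.12.2013"))  # 11
```

Under this mapping the two most frequent blocks, 11 and 23, correspond to December 2013 and December 2014, which agrees with the late-December dates that dominate the `date` frequency table.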
item_cnt_day
Numeric
| Distinct count | 198 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.242640885 |
|---|---|
| Minimum | -22 |
| Maximum | 2169 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -22 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| Median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 2169 |
| Range | 2191 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 2.618834431 |
|---|---|
| Coef of variation | 2.107474864 |
| Kurtosis | 177478.0988 |
| Mean | 1.242640885 |
| MAD | 0.4459868445 |
| Skewness | 272.8331617 |
| Sum | 3648206 |
| Variance | 6.858293776 |
| Memory size | 22.4 MiB |
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[-2.200e+01 -5.500e+00 -2.500e+00 -1.500e+00 0.000e+00 ... 1.105e+02 1.565e+02 2.595e+02 6.530e+02 2.169e+03], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 2629372 | 89.6% | |
| 2 | 194201 | 6.6% | |
| 3 | 47350 | 1.6% | |
| 4 | 19685 | 0.7% | |
| 5 | 10474 | 0.4% | |
| -1 | 7252 | 0.2% | |
| 6 | 6338 | 0.2% | |
| 7 | 4057 | 0.1% | |
| 8 | 2903 | 0.1% | |
| 9 | 2177 | 0.1% | |
| Other values (188) | 12040 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -22 | 1 | < 0.1% | |
| -16 | 1 | < 0.1% | |
| -9 | 1 | < 0.1% | |
| -6 | 2 | < 0.1% | |
| -5 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2169 | 1 | < 0.1% | |
| 1000 | 1 | < 0.1% | |
| 669 | 1 | < 0.1% | |
| 637 | 1 | < 0.1% | |
| 624 | 1 | < 0.1% |
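A few negative counts and a handful of very large values sit alongside a column where almost nine rows in ten equal 1, which is what drives the extreme skewness. Below is a minimal sketch of how one might split the column into these groups for inspection. It uses a small made-up series instead of the real data, the cutoff of 100 is an arbitrary illustration, and reading negative counts as returns is an assumption the report does not confirm.

```python
import pandas as pd

# Made-up sample; the extreme values are taken from the min/max tables.
item_cnt_day = pd.Series(
    [1, 1, 1, 2, 1, -1, 3, 1, -22, 2169, 1000, 1], dtype="float64"
)

negative = item_cnt_day[item_cnt_day < 0]   # possibly returns (assumption)
large = item_cnt_day[item_cnt_day > 100]    # arbitrary cutoff for illustration
typical = item_cnt_day[(item_cnt_day >= 0) & (item_cnt_day <= 100)]

print(len(negative), len(large), len(typical))
print(typical.median())
```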
item_id
Numeric
| Distinct count | 21807 |
|---|---|
| Unique (%) | 0.7% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 10197.22706 |
|---|---|
| Minimum | 0 |
| Maximum | 22169 |
| Zeros (%) | < 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1540 |
| Q1 | 4476 |
| Median | 9343 |
| Q3 | 15684 |
| 95-th percentile | 20949 |
| Maximum | 22169 |
| Range | 22169 |
| Interquartile range | 11208 |
Descriptive statistics
| Standard deviation | 6324.297354 |
|---|---|
| Coef of variation | 0.6201977575 |
| Kurtosis | -1.225209966 |
| Mean | 10197.22706 |
| MAD | 5579.673443 |
| Skewness | 0.2571735482 |
| Sum | 2.993751886e+10 |
| Variance | 39996737.02 |
| Memory size | 22.4 MiB |
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 26.5 27.5 28.5 29.5 ... 22164.5 22165.5 22166.5 22167.5 22169. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20949 | 31340 | 1.1% | |
| 5822 | 9408 | 0.3% | |
| 17717 | 9067 | 0.3% | |
| 2808 | 7479 | 0.3% | |
| 4181 | 6853 | 0.2% | |
| 7856 | 6602 | 0.2% | |
| 3732 | 6475 | 0.2% | |
| 2308 | 6320 | 0.2% | |
| 4870 | 5811 | 0.2% | |
| 3734 | 5805 | 0.2% | |
| Other values (21797) | 2840689 | 96.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1 | < 0.1% | |
| 1 | 6 | < 0.1% | |
| 2 | 2 | < 0.1% | |
| 3 | 2 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 22169 | 1 | < 0.1% | |
| 22168 | 6 | < 0.1% | |
| 22167 | 1114 | < 0.1% | |
| 22166 | 270 | < 0.1% | |
| 22165 | 2 | < 0.1% |
item_price
Numeric
| Distinct count | 19993 |
|---|---|
| Unique (%) | 0.7% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 890.8532327 |
|---|---|
| Minimum | -1 |
| Maximum | 307980 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 99 |
| Q1 | 249 |
| Median | 399 |
| Q3 | 999 |
| 95-th percentile | 2690 |
| Maximum | 307980 |
| Range | 307981 |
| Interquartile range | 750 |
Descriptive statistics
| Standard deviation | 1729.799631 |
|---|---|
| Coef of variation | 1.941733573 |
| Kurtosis | 445.5328258 |
| Mean | 890.8532327 |
| MAD | 769.9530494 |
| Skewness | 10.7504227 |
| Sum | 2615410572 |
| Variance | 2992206.762 |
| Memory size | 22.4 MiB |
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[-1.00000000e+00 9.50000000e-02 1.50000000e-01 3.50000000e-01 7.04356846e-01 ... 3.27400000e+04 3.29937500e+04 3.59905000e+04 4.63860000e+04 3.07980000e+05], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 299 | 291352 | 9.9% | |
| 399 | 242603 | 8.3% | |
| 149 | 218432 | 7.4% | |
| 199 | 184044 | 6.3% | |
| 349 | 101461 | 3.5% | |
| 599 | 95673 | 3.3% | |
| 999 | 82784 | 2.8% | |
| 799 | 77882 | 2.7% | |
| 249 | 77685 | 2.6% | |
| 699 | 76493 | 2.6% | |
| Other values (19983) | 1487440 | 50.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -1 | 1 | < 0.1% | |
| 0.07 | 2 | < 0.1% | |
| 0.0875 | 1 | < 0.1% | |
| 0.09 | 1 | < 0.1% | |
| 0.1 | 2932 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 307980 | 1 | < 0.1% | |
| 59200 | 1 | < 0.1% | |
| 50999 | 1 | < 0.1% | |
| 49782 | 1 | < 0.1% | |
| 42990 | 4 | < 0.1% |
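The quantiles reported for `item_price` make it easy to see how a conventional outlier rule would behave on this column. The sketch below applies Tukey's 1.5 × IQR fences, which is one common rule of thumb and not something the report itself uses, to the printed quartiles:

```python
# Quantiles copied from the report.
q1, q3 = 249.0, 999.0
p95 = 2690.0

iqr = q3 - q1                 # matches the reported interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(iqr)          # 750.0
print(lower_fence)  # -876.0
print(upper_fence)  # 2124.0
print(p95 > upper_fence)  # True
```

Because the 95th percentile already lies above the upper fence, this rule would flag roughly 5% of rows or more, far beyond the single 307980 record. The lower fence is negative, so it would flag nothing on the low side, including the lone -1 price. For a column this right-skewed, a hand-picked threshold or a log transform may be a better fit.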
shop_id
Numeric
| Distinct count | 60 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 33.00172829 |
|---|---|
| Minimum | 0 |
| Maximum | 59 |
| Zeros (%) | 0.3% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 22 |
| Median | 31 |
| Q3 | 47 |
| 95-th percentile | 57 |
| Maximum | 59 |
| Range | 59 |
| Interquartile range | 25 |
Descriptive statistics
| Standard deviation | 16.22697305 |
|---|---|
| Coef of variation | 0.4917007044 |
| Kurtosis | -1.025358056 |
| Mean | 33.00172829 |
| MAD | 13.83000446 |
| Skewness | -0.07236142921 |
| Sum | 96888091 |
| Variance | 263.3146543 |
| Memory size | 22.4 MiB |
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 3.5 5.5 ... 55.5 56.5 57.5 58.5 59. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 31 | 235636 | 8.0% | |
| 25 | 186104 | 6.3% | |
| 54 | 143480 | 4.9% | |
| 28 | 142234 | 4.8% | |
| 57 | 117428 | 4.0% | |
| 42 | 109253 | 3.7% | |
| 27 | 105366 | 3.6% | |
| 6 | 82663 | 2.8% | |
| 58 | 71441 | 2.4% | |
| 56 | 69573 | 2.4% | |
| Other values (50) | 1672671 | 57.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 9857 | 0.3% | |
| 1 | 5678 | 0.2% | |
| 2 | 25991 | 0.9% | |
| 3 | 25532 | 0.9% | |
| 4 | 38242 | 1.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 59 | 42108 | 1.4% | |
| 58 | 71441 | 2.4% | |
| 57 | 117428 | 4.0% | |
| 56 | 69573 | 2.4% | |
| 55 | 34769 | 1.2% |
First rows
| | date | date_block_num | item_cnt_day | item_id | item_price | shop_id |
|---|---|---|---|---|---|---|
| 0 | 02.01.2013 | 0 | 1.0 | 22154 | 999.00 | 59 |
| 1 | 03.01.2013 | 0 | 1.0 | 2552 | 899.00 | 25 |
| 2 | 05.01.2013 | 0 | -1.0 | 2552 | 899.00 | 25 |
| 3 | 06.01.2013 | 0 | 1.0 | 2554 | 1709.05 | 25 |
| 4 | 15.01.2013 | 0 | 1.0 | 2555 | 1099.00 | 25 |
| 5 | 10.01.2013 | 0 | 1.0 | 2564 | 349.00 | 25 |
| 6 | 02.01.2013 | 0 | 1.0 | 2565 | 549.00 | 25 |
| 7 | 04.01.2013 | 0 | 1.0 | 2572 | 239.00 | 25 |
| 8 | 11.01.2013 | 0 | 1.0 | 2572 | 299.00 | 25 |
| 9 | 03.01.2013 | 0 | 3.0 | 2573 | 299.00 | 25 |
Last rows
| | date | date_block_num | item_cnt_day | item_id | item_price | shop_id |
|---|---|---|---|---|---|---|
| 2935839 | 24.10.2015 | 33 | 1.0 | 7315 | 399.0 | 25 |
| 2935840 | 31.10.2015 | 33 | 1.0 | 7409 | 299.0 | 25 |
| 2935841 | 11.10.2015 | 33 | 1.0 | 7393 | 349.0 | 25 |
| 2935842 | 10.10.2015 | 33 | 1.0 | 7384 | 749.0 | 25 |
| 2935843 | 09.10.2015 | 33 | 1.0 | 7409 | 299.0 | 25 |
| 2935844 | 10.10.2015 | 33 | 1.0 | 7409 | 299.0 | 25 |
| 2935845 | 09.10.2015 | 33 | 1.0 | 7460 | 299.0 | 25 |
| 2935846 | 14.10.2015 | 33 | 1.0 | 7459 | 349.0 | 25 |
| 2935847 | 22.10.2015 | 33 | 1.0 | 7440 | 299.0 | 25 |
| 2935848 | 03.10.2015 | 33 | 1.0 | 7460 | 299.0 | 25 |